Link Clustering with Extended Link Similarity and EQ Evaluation Division
نویسندگان
چکیده
Link Clustering (LC) is a relatively new method for detecting overlapping communities in networks. The basic principle of LC is to derive a transform matrix whose elements are composed of the link similarity of neighbor links based on the Jaccard distance calculation; then it applies hierarchical clustering to the transform matrix and uses a measure of partition density on the resulting dendrogram to determine the cut level for best community detection. However, the original link clustering method does not consider the link similarity of non-neighbor links, and the partition density tends to divide the communities into many small communities. In this paper, an Extended Link Clustering method (ELC) for overlapping community detection is proposed. The improved method employs a new link similarity, Extended Link Similarity (ELS), to produce a denser transform matrix, and uses the maximum value of EQ (an extended measure of quality of modularity) as a means to optimally cut the dendrogram for better partitioning of the original network space. Since ELS uses more link information, the resulting transform matrix provides a superior basis for clustering and analysis. Further, using the EQ value to find the best level for the hierarchical clustering dendrogram division, we obtain communities that are more sensible and reasonable than the ones obtained by the partition density evaluation. Experimentation on five real-world networks and artificially-generated networks shows that the ELC method achieves higher EQ and In-group Proportion (IGP) values. Additionally, communities are more realistic than those generated by either of the original LC method or the classical CPM method.
منابع مشابه
Link Prediction using Network Embedding based on Global Similarity
Background: The link prediction issue is one of the most widely used problems in complex network analysis. Link prediction requires knowing the background of previous link connections and combining them with available information. The link prediction local approaches with node structure objectives are fast in case of speed but are not accurate enough. On the other hand, the global link predicti...
متن کاملExtracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering
Although many studies have been conducted to improve the clustering efficiency, most of the state-of-art schemes suffer from the lack of robustness and stability. This paper is aimed at proposing an efficient approach to elicit prior knowledge in terms of must-link and cannot-link from the estimated distribution of raw data in order to convert a blind clustering problem into a semi-supervised o...
متن کاملA Link Prediction Method Based on Learning Automata in Social Networks
Nowadays, online social networks are considered as one of the most important emerging phenomena of human societies. In these networks, prediction of link by relying on the knowledge existing of the interaction between network actors provides an estimation of the probability of creation of a new relationship in future. A wide range of applications can be found for link prediction such as electro...
متن کاملProviding a Link Prediction Model based on Structural and Homophily Similarity in Social Networks
In recent years, with the growing number of online social networks, these networks have become one of the best markets for advertising and commerce, so studying these networks is very important. Most online social networks are growing and changing with new communications (new edges). Forecasting new edges in online social networks can give us a better understanding of the growth of these networ...
متن کاملPerformance Analysis of ZF Pre-Coder and Extended Joint Channel Estimation Method in Downlink CDMA with HPA
In the down link scenario of code division multiple access (CDMA) systems, multi user detectors (MUDs) such as linear de-correlating detector (LDDs) provide satisfactory symbol error rates (SERs) at the expense of much increased complexity, they require all active users’ spreading sequences, which is impractical from privacy point of view. To overcome this impracticality, a simple matched filte...
متن کامل